Abstract —Average travel speed for a road, i.e. the average speed of
vehicles travelling through this road in a given time duration, is
important information for traffic condition evaluation, trip planning,
transportation management, etc. Traditional ways of collecting
travel-speed oriented traffic data depend on dedicated sensors and
supporting infrastructure, and are therefore financially costly. In
contrast, vehicular crowdsensing is an infrastructure-free and low-cost
way to collect data, including real-time locations and velocities of
vehicles, for road travel speed estimation. However, vehicular
crowdsensing data is usually coarse-grained, and this coarseness can
leave travel speeds incomplete. To handle this problem and to estimate
travel speed accurately, in this paper we propose an approach named STC
that exploits the spatial-temporal correlation among travel speeds of
roads by introducing the time-lagged cross correlation function. The
time lagging factor describes the time consumed by the diffusion of a
traffic feature along roads. To calculate cross correlation properly,
we make the determination of the time lagging factor self-adaptive by
recording the locations of vehicles on different roads. Then, utilizing
the local stationarity of cross correlation, we reduce the problem of
single-road travel speed vacancy completion to a minimization problem.
Finally, we fill all the travel speed vacancies of roads in a recursive
way using the geometric structure of the road net. Experiments based on
real taxi trace data show that STC can settle the incompleteness
problem of vehicular crowdsensing data based travel speed estimation
and ensures better accuracy of the estimated travel speed than
representative existing methods such as KNN, Kriging and ARIMA.

Index Terms —Travel Speed Estimation, Vehicular Crowdsensing,
Spatial-Temporal Correlation.

I. Introduction

The average travel speed for a road is the average speed of vehicles
travelling through this road in a given time duration. Such information
can be used for traffic condition evaluation, trip planning,
transportation management, and so on. Services based on average travel
speed are helpful for both drivers and traffic controllers.
Traditionally, to get travel speed information, the road net is
monitored by infrastructure-supported devices such as cameras [1], [2],
loop detectors [3], [4] and radio sensors [5], which are very expensive
to deploy and maintain [6]. Moreover, these approaches can be affected
by the environment. For example, the quality of camera-based data is
vulnerable to weather conditions, light conditions, and so on.

Different from infrastructure-based ways, vehicular crowdsensing is a
recent infrastructure-free way of collecting traffic data. Vehicles
running on roads can incidentally offer their data, including
timestamps, locations and real-time speeds, to a central server via
wireless communication. The feasibility of vehicular crowdsensing is
guaranteed by the wide deployment of on-board GPS and wireless
communication devices such as navigation facilities. Since it needs
almost no additional equipment, the cost of vehicular crowdsensing is
very low compared with infrastructure-based methods. We can use
vehicular crowdsensing data to estimate travel speeds for roads.
However, vehicular crowdsensing data is usually coarse-grained. The
first reason is that not all the vehicles running on roads are willing
to offer data. The second reason is the temporal variation of the
spatial distribution of vehicles. For example, in the morning rush
hour most roads are covered by vehicles, while far fewer roads are
covered at midnight. The coarseness of vehicular crowdsensing data
leads to the incompleteness problem of travel speeds of roads, which is
confirmed by our analysis of real taxi probe data. In addition, to
guarantee the usability of travel speeds, the estimated values have to
meet a predefined accuracy requirement.

In order to handle the incompleteness problem of vehicular
crowdsensing data based travel speed estimation and to calculate travel
speed accurately, in this paper we propose a spatial-temporal
correlation based approach named STC (Spatial Temporal Correlation). In
STC, we leverage the spatial-temporal correlation of travel speeds
among roads. Our basic argument is that the travel speeds of roads
close to each other are correlated due to the spatial proximity of
these roads. More specifically, we use the time-lagged cross
correlation function to measure the relevance between travel speeds of
different roads in different calculation intervals. Note that, because
vehicles need time to run from road $r_1$ to road $r_2$, within a
relatively small area the farther apart two roads are, the longer it
takes a traffic characteristic to spread from $r_1$ to $r_2$. We call
this time consumption the time lagging factor. In particular, we employ
the traceability of vehicle locations to determine the time lagging
factor self-adaptively, which helps to calculate the cross correlation
between roads properly. Having the cross correlation, we fill the
vacancy of a single-road travel speed with the help of nearby roads
whose travel speeds are not empty. We accomplish this by converting it
to a minimization problem, using the local stationarity of the cross
correlation between roads. Finally, we fill all the travel speed
vacancies of roads in a recursive way by utilizing the geometric
structure of the road net.

To examine the effectiveness of our approach, we conduct experiments
based on real trace data of 8363 taxis, collected in Shenzhen, China
during April 18-26, 2011. Through cross-validation, we show that our
approach can handle the incompleteness of travel speeds and guarantees
the accuracy of travel speed calculation better than classical
estimation methods such as KNN, Kriging and ARIMA.

The main contributions of this paper are as follows: 

• We propose a spatial-temporal correlation based approach named STC to
estimate travel speed based on coarse-grained vehicular crowdsensing
data, in which we use the time-lagged cross correlation function to
measure the correlation among roads in terms of travel speeds.

• To calculate cross correlation properly, we make the determination of
the time lagging factor self-adaptive by leveraging the traceability of
vehicle locations at upstream and downstream roads.

• We reduce the problem of single-road travel speed vacancy completion
to a minimization problem by using the local stationarity of cross
correlation. Then, by utilizing the topology of the road net, we fill
all the travel speed vacancies of roads in a recursive way.

The rest of this paper is organized as follows. In Section II, we
review existing works related to ours. In Section III, we formulate the
problem that we aim to handle. In Section IV, we introduce our design
of STC for travel speed estimation in detail. In Section V, we show our
experimental evaluation results and discuss the use of our method for
future travel speed prediction. Finally, we give our conclusion and
discuss future work in Section VI.

II. Related Work 

We review existing works related to ours in the following aspects:

Vehicular Crowdsensing. Several works have examined the use of vehicles
for particular applications. For example, Bauza et al. [7] proposed a
vehicular multi-hop end-to-front local traffic congestion level
detection strategy. Du et al. [8] aimed to monitor congestion
effectively with floating cars whose routes are optimized to sense data
over a wide geographical range. Hu et al. [9] designed a
role-transferring mechanism for on-board phones to optimize
communication performance in vehicle-based participatory sensing
applications. Thiagarajan et al. [10] proposed to track mobile phone
locations to estimate road traffic delay in an energy-efficient way.
Sun et al. [11] used cognitive radio vehicular networks to enable
emergency communication between vehicles. Coric et al. [12] used
crowdsensing to sketch the map of on-street parking spaces to help
drivers find parking positions. Chen et al. [13] took privacy into
consideration while using participatory sensing to draw a high quality
city map. None of the above works examined travel speed estimation.
Wang et al. [14] did, but paid attention only to freeways rather than
ordinary roads. Aslam et al. [6] used taxis as probes to estimate
traffic volume with the help of logistic regression and linear
regression. However, none of these existing works applied vehicular
crowdsensing to wide-range travel speed calculation.


Vacancy Completion Methods. Incompleteness is a critical problem in
vehicle-collected data, and works in the literature have handled it
with various methods. For example, Alasmary et al. [15] used a branch
and bound approximation algorithm to select the optimal number of
vehicles for collecting data when wireless channel capacity is limited.
Compressive sensing was considered by works such as [16] [17] [18] to
collect and recover vehicle-related data. However, the computational
complexity of compressive sensing is high. Matrix completion was also
used to estimate incomplete data in the work of Du et al. [8], again at
a high calculation cost. Moreover, we mainly focus on completing the
vacancies of a single calculation interval, which resembles a
single-column vector rather than a multi-column matrix. Taking the
spatial correlation of roads into consideration, Zou et al. [19]
proposed to use Kriging interpolation to complete the data where no
detection equipment such as a loop detector is deployed. Correlation
among roads was better used by Aslam et al. [6], who introduced the
similarity of roads. However, they overlooked the temporal correlation
among roads.

III. Problem Formulation

In this section, we first formalize the road net, based on which we
state the problem of coverage incompletion according to taxi trace
statistics. Then we explain the goal of travel speed calculation.

A. Road Net Formalization 


(a) A sketch of road structure. (b) Graphical presentation of road
structure.

Fig. 1: Illustration of road net formalization. 

Fig. 1(a) is a sketch of a fraction of a real road net. It consists of
road segments and intersections. A road segment is a portion of road
that has only one entrance and only one exit, and vehicles are only
allowed to run from this entrance to this exit under transportation
regulations. An intersection is a point connecting adjacent road
segments. For conciseness, we use a directed graph to represent the
road net. A vertex in this graph represents an intersection in the real
road net, while an edge represents a road segment. We use
$\mathcal{G}$ to represent the graph of the whole road net,
$\mathcal{R}$ to represent the edge set of $\mathcal{G}$, and
$\mathcal{V}$ to represent its vertex set. Fig. 1(b) is the graphical
form of Fig. 1(a), where $r_i \in \mathcal{R}$, $i = 1, 2, \cdots, 16$.

B. Problem of Coverage Incompletion in Vehicular Crowdsensing

As aforementioned, we use vehicular crowdsensing to collect traffic
data. The data, containing the timestamp, longitude, latitude and
velocity of a vehicle, is offered voluntarily by this vehicle via
wireless communication. We formally call a piece of data in this form a
record. We examine real trace data of taxis in Shenzhen, China to first
show the spatial gain of our approach and then show the still-existing
spatial-temporal vacancies. There are 8363 taxis that participated in
this data collection from April 18th 00:00:00, 2011 to April 26th
00:00:00, 2011. Fig. 2 illustrates a snapshot of the spatial
distribution of these taxis. Intuitively, we can see that the roads
reached by taxis are not limited to main roads.





Fig. 2: A snapshot of taxi distribution. 

However, given the length $T$ of a calculation interval, owing to the
mobility of vehicles, the number of collected records that can be used
to directly calculate the travel speeds of different road segments
varies across intervals. For road segment $r_i \in \mathcal{R}$, in
calculation interval $t_j$, we denote this number as $n_{ij}$. For
reliability, only when $n_{ij}$ is equal to or bigger than a predefined
integer threshold $N_{thr}(T)$ can we calculate the travel speed for
$r_i$ directly. So we define,

Definition 1: Road segment $r_i$ is covered in calculation interval
$t_j$ if $n_{ij} \geq N_{thr}(T)$.
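
As an illustration of Definition 1, the following minimal sketch counts the records per road segment in one calculation interval and keeps the segments that reach the threshold. The record layout `(timestamp, segment_id)` and the function name `covered_segments` are assumptions made for this example only; they suppose that records have already been map-matched.

```python
from collections import Counter

def covered_segments(records, interval_start, T, n_thr):
    """Return the ids of segments covered in [interval_start, interval_start + T).

    records: iterable of (timestamp_seconds, segment_id) pairs. This layout is
    hypothetical; it assumes each record was already matched to a segment.
    """
    counts = Counter(
        seg for ts, seg in records
        if interval_start <= ts < interval_start + T
    )
    # Definition 1: a segment is covered if n_ij >= N_thr(T).
    return {seg for seg, n in counts.items() if n >= n_thr}

# Toy data: two records on r1 and one on r2 in the first minute.
records = [(0, "r1"), (10, "r1"), (20, "r2"), (70, "r2"), (75, "r2")]
print(covered_segments(records, interval_start=0, T=60, n_thr=2))
```

The size of the returned set corresponds to $N_c$ for that interval.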

Let $N_r$ denote the total number of road segments in $\mathcal{R}$.
Based on our road net formalization, we have 1882 road segments in the
central area of Shenzhen, which gives $N_r = 1882$. Then we denote the
number of road segments that are covered in $t_j$ as $N_c$. Fig. 3
shows the variation of $N_c$ with different $T$ and $N_{thr}$.



Fig. 3: The variation of $N_c$ when $T$ changes from 10s to 120s with
step 10s and $N_{thr}$ changes from 1 to 10 with step 1. The begin time
of this duration is fixed at 8:00:00 April 18th, 2011.

From Fig. 3, we can see that $N_c$ becomes bigger when the value of $T$
is bigger and the value of $N_{thr}$ is smaller. This trend is
consistent with our expectation. Then we fix $T$ and observe the
variation of $N_c$ over time for different values of $N_{thr}$. The
result is shown in Fig. 4. As we can see, when $N_{thr}$ increases, the
value of $N_c$ decreases. Also, the value of $N_c$ has peaks and
valleys. These phenomena show the coverage incompletion of vehicular
crowdsensing data.



Fig. 4: The variation of $N_c$ with the advance of time at fixed
$T = 60s$ and $N_{thr} = 2, 3, 4, 5$ respectively. The time duration is
from April 22nd 00:00:00, 2011 to April 26th 00:00:00, 2011.


IV. Our Design of STC 

In this section, we first give an overview of STC. Then 
we introduce our strategy for GPS map-matching. Next we 
explain our methods for single road travel speed calculation 
and global travel speed vacancy completion. 

A. Overview of STC 



Fig. 5: An overview of STC. 


The overview of STC is shown in Fig. 5. In STC, every time a record is
uploaded by vehicular crowdsensing, the GPS information is extracted
from the record and mapped to its corresponding road on the map, where
we use a grid index of roads and vehicle tracking to accelerate
map-matching. When a calculation interval ends, the travel speeds of
roads where vehicle data is available are calculated by combining the
instant vehicle speeds extracted from uploaded records with the average
speeds obtained by vehicle tracking. Coverage vacancies remain after
this step, so for the roads without vehicle data, the spatial-temporal
correlation among travel speeds is used to fill these vacancies, in
which we make the determination of the time lagging factor in cross
correlation calculation self-adaptive. Then, we reduce the single-road
travel speed vacancy completion problem to a minimization problem using
the local stationarity of cross correlation. Finally, we achieve global
coverage completion in a recursive way.

B. Map-Matching 

The goal of map-matching is to project a GPS location onto a road
segment accurately. There are two problems we have to handle. The first
is the noise contained in GPS data, outliers for example. The second is
that a large number of roads are candidates for a GPS location at the
beginning of matching.


(b) Vehicle tracing based candidate road selection.


Fig. 6: Grid and vehicle tracing based map matching. 

To eliminate the noise in GPS data, we use a constrained shortest
distance to find the fittest road segment for a GPS location $P$. We
set a distance bound $D_{min}$. If the distance between $P$ and every
segment is bigger than $D_{min}$, we regard $P$ as an outlier and drop
it.

To handle the candidate choosing problem, we first split the map into
grids and index road segments to the grids. An illustration of this
split is shown in Fig. 6(a). Note that many roads cross more than one
grid. Therefore, one road segment can belong to more than one grid.
Assume that the coordinate of the point at the top left corner is
$P_0(x_0, y_0)$. When we need to locate a GPS location $P(x, y)$, we
first calculate the grid it belongs to. Assume this grid is $g$. Then
we check the road segments indexed in $g$ one by one to find the road
segment that $P$ belongs to.
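
The grid lookup combined with the $D_{min}$ outlier rule can be sketched as follows. This is only an illustration under simplifying assumptions that are not from the paper: coordinates are planar with $y$ growing upward, each road segment is a straight line between two endpoints, a segment is indexed under every cell touched by its bounding box, and only the cell containing $P$ is searched. All function names are hypothetical.

```python
import math
from collections import defaultdict

def cell_of(x, y, x0, y0, cell):
    """Grid cell (column, row) of point (x, y); (x0, y0) is the top-left corner."""
    return (int((x - x0) // cell), int((y0 - y) // cell))

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the straight segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0
    else:
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def build_index(segments, x0, y0, cell):
    """Index each segment under every cell its bounding box touches."""
    index = defaultdict(list)
    for seg_id, (a, b) in segments.items():
        col0, row0 = cell_of(min(a[0], b[0]), max(a[1], b[1]), x0, y0, cell)
        col1, row1 = cell_of(max(a[0], b[0]), min(a[1], b[1]), x0, y0, cell)
        for col in range(col0, col1 + 1):
            for row in range(row0, row1 + 1):
                index[(col, row)].append(seg_id)
    return index

def match(p, segments, index, x0, y0, cell, d_min):
    """Nearest indexed segment within d_min of p, or None if p is an outlier."""
    best_id, best_d = None, d_min
    for seg_id in index.get(cell_of(p[0], p[1], x0, y0, cell), []):
        d = point_segment_distance(p, *segments[seg_id])
        if d <= best_d:
            best_id, best_d = seg_id, d
    return best_id

segments = {"r1": ((10, 90), (90, 90)), "r2": ((20, 40), (20, 10))}
index = build_index(segments, x0=0, y0=100, cell=50)
print(match((30, 85), segments, index, 0, 100, 50, d_min=10))
```

Searching only one cell can miss a closer segment lying just across a cell border; a fuller implementation would also inspect adjacent cells.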

We can see that there is a tradeoff between grid size and storage cost.
If the grid size is too small, storing the index structure takes too
much space. If the grid size is relatively big, the number of roads we
have to check one by one also becomes big. Under this circumstance, we
propose a vehicle tracking based method to carry out map-matching. Our
key idea is that if the time difference between two successive
locations of a vehicle is small, the vehicle cannot have run a long
distance. Assume that the vehicle is now located on road segment $r$.
An outward neighbor of a road segment $r$ is a road segment that takes
$r$'s exit as its own entrance. We take the outward neighbors of $r$,
and recursively the outward neighbors of these outward neighbors, as
candidate road segments. Fig. 6(b) shows an example of this strategy.
The red point in Fig. 6(b) is the last location of vehicle $v$. The
first class candidate roads for the next location of $v$ are the yellow
ones, which are the direct outward neighbors of the road segment that
the red point is on, and the second class candidates are the green
ones, which are the outward neighbors of the yellow ones.
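
The candidate expansion by outward neighbors can be sketched as a breadth-first walk over the directed road graph. The adjacency dictionary `outward`, the `depth` parameter and the choice to return the current segment as class 0 (the vehicle may still be on it) are assumptions of this example, not details given in the paper.

```python
def candidate_segments(current, outward, depth=2):
    """Candidate segments grouped by class, up to `depth` outward hops.

    outward: dict mapping a segment id to the list of its outward neighbors
    (hypothetical representation of the directed road graph).
    Returns [[current], first-class candidates, second-class candidates, ...].
    """
    classes = [[current]]
    seen = {current}
    for _ in range(depth):
        next_class = []
        for seg in classes[-1]:
            for nb in outward.get(seg, []):
                if nb not in seen:      # keep each segment in its nearest class
                    seen.add(nb)
                    next_class.append(nb)
        classes.append(next_class)
    return classes

outward = {"r1": ["r2", "r3"], "r2": ["r4"], "r3": ["r4", "r5"], "r4": ["r1"]}
print(candidate_segments("r1", outward))
```

Checking the classes in order keeps the number of distance computations small when successive records are close in time.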

C. Travel Speed Calculation & Vacancy Completion 

Now we introduce the way we calculate travel speeds 
for the roads where vehicular data is available, measure 
spatial-temporal correlations and explain how to deal with 
travel speed vacancies. 

1) Travel Speed Calculation for Roads with Available 
Vehicular Data: 

At the end of the current calculation interval, the accumulation of
records for this interval is finished. So we can calculate travel
speeds for the road segments that are covered according to Definition
1. For later use, we define the network distance between two points
$P_1$ and $P_2$ as,

Definition 2: $dist(P_1, P_2)$ is the distance travelled from $P_1$ to
$P_2$ along the shortest lawful driving path between them.

Then we define,

Definition 3: The central point of a road segment $r$, denoted as
$cp(r)$, is the point at the middle of $r$ when travelling along $r$
from its entrance to its exit.




Fig. 7: Road segments and vehicle locations. 

We take the situation shown in Fig. 7 as an example to illustrate our
calculation strategy. Assume the end time of the current interval is
$t_e$, so its begin time is $t_b = t_e - T$. Suppose that vehicle $v_1$
runs from location $p_1$ to location $p_2$, then to $p_3$, and vehicle
$v_2$ runs from $p_4$ to $p_5$, then to $p_6$. The value $s_i$ is the
instant speed of the vehicle at its corresponding location $p_i$, and
$t_i$ is the corresponding timestamp. Assume that $p_2, p_3, p_5, p_6$
are on $r$, that $t_2, t_3, t_5, t_6 \in [t_b, t_e)$, and that there is
no other vehicle on $r$ during $[t_b, t_e)$.

We then calculate the two-phase average speeds of $v_1$ from $p_1$ to
$p_3$ as,

$\bar{s}_{11} = dist(p_1, p_2)/(t_2 - t_1), \quad \bar{s}_{12} = dist(p_2, p_3)/(t_3 - t_2).$

Similarly, the average speeds for $v_2$ from $p_4$ to $p_6$ are,

$\bar{s}_{21} = dist(p_4, p_5)/(t_5 - t_4), \quad \bar{s}_{22} = dist(p_5, p_6)/(t_6 - t_5).$

Then, we give the average speed for $r$ in this interval $T$ as,

$S = (\bar{s}_{11} + \bar{s}_{12} + \bar{s}_{21} + \bar{s}_{22} + s_2 + s_3 + s_5 + s_6)/(4 + 4),$

where the first integer 4 in the denominator is the number of average
speeds calculated above, and the second 4 is the number of speeds
extracted from records that are right on $r$.

As demonstrated, we combine the instant speeds reported by vehicles
with the average speeds calculated from locations and timestamps to
generate the average speed for $r$. This combination leverages both the
explicit and the implicit information in vehicular data.
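
The combination above can be sketched in a few lines. In this illustration the network distance between two successive records is assumed to be available as the difference of cumulative positions along the driving path; that representation, the function name and the toy numbers are assumptions of the example.

```python
def segment_travel_speed(tracks, instant_speeds):
    """Mean of the phase average speeds and the reported instant speeds.

    tracks: one list per vehicle of (network_position_m, timestamp_s) pairs,
        ordered by time. Positions are assumed to be cumulative distances along
        the driving path, so dist(p_a, p_b) = position_b - position_a.
    instant_speeds: speeds (m/s) taken from records located on the segment.
    """
    phase_speeds = []
    for track in tracks:
        for (pos_a, t_a), (pos_b, t_b) in zip(track, track[1:]):
            if t_b > t_a:  # skip duplicated timestamps
                phase_speeds.append((pos_b - pos_a) / (t_b - t_a))
    values = phase_speeds + list(instant_speeds)
    return sum(values) / len(values) if values else None

# Two vehicles with three records each, mirroring the example of Fig. 7.
v1 = [(0, 0), (100, 10), (250, 20)]    # phase speeds 10 and 15 m/s
v2 = [(0, 5), (120, 15), (200, 25)]    # phase speeds 12 and 8 m/s
print(segment_travel_speed([v1, v2], [11, 14, 12, 9]))  # (45 + 46) / 8 = 11.375
```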

2) Cross Correlation Function: 

The traffic features at upstream and downstream road segments are
correlated. In our paper, this feature is travel speed. We use the
time-lagged cross correlation function to quantify this correlation.
The cross correlation function measures the inter-relevance between two
vectors. We use Fig. 7 to explain our cross correlation calculation
procedure. In Fig. 7, $r_1$ and $r_2$ are upstream road segments of
$r$, and $r_3$ is a downstream road segment of $r$. Denote the travel
speed of $r_1$ at the $j^{th}$ calculation interval as $x_j$. Suppose
that, on average, vehicles on $r_1$ spend $k$ calculation intervals to
run from the central point of $r_1$ to the central point of $r$. We
call the number $k$ the time lagging factor. Then we denote $y_{j+k}$
as the travel speed of $r$ at the $(j+k)^{th}$ calculation interval. We
use $X$ and $Y$ to represent the random variable forms of $x$ and $y$
respectively. Then the cross correlation between $X$ and $Y$ is,

$c(X, Y) = \frac{\gamma_{XY}(k)}{\sigma_X \sigma_Y}, \quad k = 0, \pm 1, \pm 2, \pm 3, \cdots$ (1)

where $\gamma_{XY}(k) = E[(x_j - \mu_X)(y_{j+k} - \mu_Y)]$, with
$\mu_X$ and $\mu_Y$ being the means of $X$ and $Y$, and $\sigma_X$,
$\sigma_Y$ being the standard deviations of $X$ and $Y$.
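
A sample estimate of Eq. (1) can be sketched as below. The means and standard deviations are computed over the overlapping pairs $(x_j, y_{j+k})$ only, which is one possible estimator and an assumption of this example; the function name is hypothetical.

```python
import math

def lagged_cross_correlation(x, y, k):
    """Sample estimate of c(X, Y) at lag k, pairing x[j] with y[j + k]."""
    pairs = [(x[j], y[j + k]) for j in range(len(x)) if 0 <= j + k < len(y)]
    if len(pairs) < 2:
        raise ValueError("not enough overlapping samples")
    xs, ys = zip(*pairs)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in pairs) / len(pairs)
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs) / len(xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys) / len(ys))
    if sx == 0 or sy == 0:
        raise ValueError("constant series: correlation undefined")
    return cov / (sx * sy)

# Downstream speeds repeat the upstream pattern two intervals later.
upstream = [30, 35, 40, 38, 32, 28]
downstream = [31, 29] + upstream
print(lagged_cross_correlation(upstream, downstream, 2))
print(lagged_cross_correlation(upstream, downstream, 0))
```

In this toy series the correlation at lag 2 is essentially 1, while lag 0 gives a clearly lower value, which is the effect the time lagging factor is meant to capture.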

A proper use of the cross correlation function requires that the random
variables $X$ and $Y$ satisfy the stationarity condition. In terms of
travel speed, this means that the travel speed of a road segment $r$
fluctuates around a constant value. From everyday experience, the
travel speed of a road cannot remain stationary for a long time,
because the traffic pattern varies. However, travel speed can be
stationary within a short period of time. We denote the length of the
sliding window as $W$. It contains an integral number of calculation
intervals, denoted as $w = W/T$. As supporting evidence, we show our
statistical result of the travel speed variation of a road in the
central city with different $w$ in Fig. 8, which indicates the
stationarity of travel speeds.

3) Vehicle Tracking Based Self-adaptive Time Lagging 
Factor Determination: 

In order to calculate the travel speed cross correlation between two
road segments accurately, the time lagging factor $k$ between them must
be determined properly. We denote the time lagging factor between two
road segments $u$ and $r$ as $k_{u,r}$. Note that $k_{u,r}$ differs at
different times of the day. For example, at the morning peak $k_{u,r}$
is bigger than at midnight due to traffic congestion in the morning.
Therefore, it is necessary to determine $k_{u,r}$ adaptively for
different times.

Fig. 8: The standard deviations of travel speeds for different lengths
of $w$ when time advances. Here, we set $T = 60s$. We can see that the
standard deviations are relatively low, and that these values first
decrease and then increase when $w$ increases.

We use vehicle tracking to determine $k_{u,r}$. By recording the
locations of a vehicle at different timestamps, we can obtain the time
this vehicle takes to run from $u$ to $r$. Then, by tracking a certain
number of such vehicles, we can estimate $k_{u,r}$ in the average
sense.
Specifically, we record the vehicles that have run on $r$ within the
time duration $W$ and put them in set $V_r$. Since $u$ is upstream of
$r$, some vehicles in $V_r$ came from $u$. By checking records, we
select the vehicles satisfying this criterion and put them in set
$V_{u,r}$. Note that the timestamps at which vehicles in $V_{u,r}$ were
on $u$ may be prior to the beginning of $W$. Every vehicle $v$ in
$V_{u,r}$ has at least one record on $u$ and at least one on $r$. For
$u$, we select the record whose location is nearest to the central
point of $u$. If more than one record is selected, we further select
the one whose timestamp is the biggest. We call this selection
$s'(v, u)$. For $r$, we select the record whose location is nearest to
the central point of $r$, and among multiple records satisfying this
criterion the one whose timestamp is the smallest. We call this
selection $s''(v, r)$. With these two selection results for each
$v \in V_{u,r}$, we then calculate the average speed of $v$ from $u$ to
$r$ as,

$avg(u, r, v) = \frac{dist(loc(s''(v, r)), loc(s'(v, u)))}{time(s''(v, r)) - time(s'(v, u))}.$

Then we calculate the travel time of $v$ from $cp(u)$ to $cp(r)$ at the
speed of $avg(u, r, v)$, namely,

$travelT(v) = dist(u, r) / avg(u, r, v).$

With the values of $travelT(v_i)$, we have the estimation of $k_{u,r}$
as,

$k_{u,r} = floor\left( \frac{\sum_{v_i \in V_{u,r}} travelT(v_i)}{|V_{u,r}| \cdot T} \right),$

where $floor(\cdot)$ gives the value of the floor of a decimal.
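
The estimation of $k_{u,r}$ can be illustrated as follows. The input layout, one `(distance, elapsed)` pair per tracked vehicle between its selected record on $u$ and its selected record on $r$, and the function name are assumptions of this sketch.

```python
import math

def time_lagging_factor(samples, dist_cp, T):
    """k_{u,r} = floor(mean travel time between central points / T).

    samples: per-vehicle (distance_m, elapsed_s) between the selected record
        on u and the selected record on r (hypothetical layout).
    dist_cp: network distance from cp(u) to cp(r) in metres.
    T: length of a calculation interval in seconds.
    """
    travel_times = []
    for distance, elapsed in samples:
        if distance > 0 and elapsed > 0:
            avg_speed = distance / elapsed            # avg(u, r, v)
            travel_times.append(dist_cp / avg_speed)  # travelT(v)
    if not travel_times:
        return None  # no usable vehicle was tracked from u to r
    return math.floor(sum(travel_times) / (len(travel_times) * T))

# Three tracked vehicles with speeds 10, 5 and 8 m/s; cp(u) to cp(r) is 600 m,
# so the travel times are 60, 120 and 75 s.
samples = [(500, 50), (450, 90), (520, 65)]
print(time_lagging_factor(samples, dist_cp=600, T=60))  # floor(255 / 180) = 1
```

Because the samples are taken from the current window, recomputing this value per window is what makes the factor adapt to congestion.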

Therefore, through vehicle tracking, we get the estimation of the time
lagging factor between two road segments during $W$. To show the
feasibility of this estimation method, Fig. 9 compares the cross
correlation between roads selected in the central city when calculated
with our $k_{u,r}$ and with predefined time lagging factors. It shows
that, compared with predefined time lagging factors, the $k_{u,r}$
determined by our method gives more evident cross correlation values.

Fig. 9: The comparison of the values of $k_{u,r}$ between our
estimation method and the predefined values (i.e. 0-5). The roads
$r_1$, $r_2$, $r_3$, $r_4$ and $r$ are selected in the central city of
Shenzhen. The values of $k$ marked by blue stars in this figure are the
time lagging factors estimated by our method. It can be seen that our
method usually estimates the most proper $k$, i.e. those corresponding
to the highest cross correlation values. The value of $T$ is 60s, and
the value of $w$ is set to 10. The time span of this figure is from
8:00:00 to 9:00:00 on April 18th, 2011.

However, it is noteworthy that the most proper value of $k$ does not
necessarily correspond to the highest value of cross correlation.
Fig. 10 illustrates the explanation for this. This is why we do not
traverse all candidate lags and take the one with the highest cross
correlation as the $k$ we need.

Fig. 10: An explanation for the choice of $k$. The travel speed value
in purple (surrounded by a circle) needs to be estimated. Here $w = 7$.
If we search $r_i$'s values for the travel speed sequence that has the
biggest cross correlation with the blue ones in $r$'s values, we will
locate the red sequence. Then the purple value will be estimated with
reference to the value surrounded by a rectangle. However, the
rectangle value has increased from the last red one, while the purple
one is lower than the last blue one, so their trends are not the same.
In contrast, though the green sequence has a lower cross correlation
with the blue one, the value in yellow (surrounded by a triangle)
decreases after the last green value, which is similar to the trend of
the purple one.

4) Travel Speeds Estimation for Roads Where Vehicular Data Is
Unavailable:

(1) A Minimization Problem for Single-Road Vacancy

In section IV-C1, we showed our method of travel speed calculation for
roads where data for the current calculation interval is available. Now
we handle a single-road vacancy by using the cross correlation
function. First, we define,

Definition 4: $X(index_1, index_2)$ is the sub-vector of vector $X$
from the subscript $index_1$ to the subscript $index_2$, and
$X(index)$ is the item at the location $index$, which begins from 1.
For example, if $X = [1, 3, 5, 7, 9]$, then $X(2, 3)$ gives $[3, 5]$,
and $X(3)$ gives 5.

Suppose that the current calculation interval is the $n^{th}$ one from
the beginning of the observation period $\mathcal{T}$, and now we want
to fill the vacancy of the current travel speed for road $r$. Denote
the area in which the roads we will consider to help to calculate the
travel speed for $r$ as $A_r$, and denote the upstream roads of road
$r$ within $A_r$ as $R_r = \{r_1, r_2, r_3, \cdots\}$. We then
calculate the cross correlations between every $r_i \in R_r$ and $r$.

Denote the travel speed vector for $r_i$ from the 1st to the $n^{th}$
interval in $\mathcal{T}$ as $X_{r_i}$, and that for $r$ as $X_r$. As
introduced in section IV-C3, we can estimate the time lagging factor
$k_{r_i,r}$ from $r_i$ to $r$. Then for any $r_i \in R_r$, the cross
correlation between $r_i$ and $r$ during the last $W$, which starts at
the $(n - w)^{th}$ and ends at the $(n - 1)^{th}$ calculation interval,
can be calculated as,

$c_{pre}(r_i, r) = c(X_{r_i}(n - k_{r_i,r} - w, n - k_{r_i,r} - 1), X_r(n - w, n - 1)).$ (2)

Fig. 11 illustrates the interval selection for the calculation of
$c_{pre}(r_i, r)$.

Fig. 11: The interval selection to calculate $c_{pre}(r_i, r)$. Here
$k_{r_1,r} = 3$, $k_{r_2,r} = 3$, $k_{r_3,r} = 2$, $k_{r_m,r} = 0$.

Similarly, the cross correlation between $r_i$ and $r$ during the $W$
which starts at the $(n - w + 1)^{th}$ and ends at the $n^{th}$
calculation interval can be expressed as,

$c_{now}(r_i, r) = c(X_{r_i}(n - k_{r_i,r} - w + 1, n - k_{r_i,r}), X_r(n - w + 1, n)).$ (3)

We illustrate the interval selection of the calculation of
$c_{now}(r_i, r)$ in Fig. 12. It is noteworthy that $c_{now}(r_i, r)$
contains the unknown quantity $X_r(n)$.



Fig. 12: The interval selection to calculate $c_{now}(r_i, r)$.

Let vector $C_{pre}(r) = [c_{pre}(r_1, r), c_{pre}(r_2, r), \cdots]$
denote the values of $c_{pre}(r_i, r)$ between every $r_i$ and $r$, and
put the values of $c_{now}(r_i, r)$ between every $r_i$ and $r$ in
vector $C_{now}(r) = [c_{now}(r_1, r), c_{now}(r_2, r), \cdots]$.

We observe that the cross correlation of travel speeds between roads
also meets the stationarity demand, which is shown in Fig. 13 as an
example. So we can estimate the value of the current travel speed
$X_r(n)$ for $r$ by minimizing the difference between $C_{pre}(r)$ and
$C_{now}(r)$, namely,

$obj = \min_{X_r(n)} \| C_{now}(r) - C_{pre}(r) \|_2,$ (4)

that is,

$obj = \min_{X_r(n)} \sqrt{\sum_{i=1}^{m} [c_{now}(r_i, r) - c_{pre}(r_i, r)]^2},$

where $m$ is the number of $r_i$'s. We then let

$f(X_r(n)) = \sum_{i=1}^{m} [c_{now}(r_i, r) - c_{pre}(r_i, r)]^2,$

so the minimization of $obj$ is equal to the minimization of
$f(X_r(n))$. For simplicity of presentation, we let
$Y_{r_i} = X_{r_i}(n - k_{r_i,r} - w + 1, n - k_{r_i,r})$ and
$Y_r = X_r(n - w + 1, n)$, where $Y_r(w) = X_r(n)$. Then $f$ can be
further formulated as,

$f = \sum_{i=1}^{m} \left( \frac{E[(y_r - \mu_{Y_r})(y_{r_i} - \mu_{Y_{r_i}})]}{\sigma_{Y_r} \sigma_{Y_{r_i}}} - c_{pre}(r_i, r) \right)^2.$

We then let

$g_i(X_r(n)) = E[(y_r - \mu_{Y_r})(y_{r_i} - \mu_{Y_{r_i}})],$

and

$h_i(X_r(n)) = \sigma_{Y_r} \sigma_{Y_{r_i}},$

where $X_r(n)$ exists in $y_r$, $\mu_{Y_r}$ and $\sigma_{Y_r}$. So $f$
can be shown in a simpler way as,

$f = \sum_{i=1}^{m} \left( \frac{g_i}{h_i} - c_{pre}(r_i, r) \right)^2.$

And the derivative of $f$ in terms of $X_r(n)$ is,

$f' = \sum_{i=1}^{m} 2 \left( \frac{g_i}{h_i} - c_{pre}(r_i, r) \right) \frac{g_i' h_i - g_i h_i'}{h_i^2}.$

Then by letting $f' = 0$, the value of $X_r(n)$ at which $obj$ reaches
its minimum can be obtained.

(a) The changes of cross correlations between different roads and $r$
when time advances. We can see that the cross correlation value between
road pairs fluctuates in a small range.

(b) The mean value and standard deviations of the cross correlations
shown in Fig. 13(a). It can be seen that the standard deviations are
small, which indicates the stationarity of cross correlations.

Fig. 13: The illustration of the stationarity of travel speeds cross
correlation between roads.
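
The minimization of $f$ can also be illustrated numerically. The sketch below does not solve $f' = 0$; as a plainly swapped-in technique it evaluates $f$ on a grid of candidate speeds and keeps the best one. The search range, the step, the function names and the handling of constant windows are all assumptions of this example.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    if va == 0 or vb == 0:
        return 0.0  # assumption: treat a constant window as uncorrelated
    return cov / math.sqrt(va * vb)

def fill_vacancy(y_r_known, neighbours, lo=0.0, hi=40.0, step=0.1):
    """Grid-search the X_r(n) minimising f = sum_i (c_now_i - c_pre_i)^2.

    y_r_known: the w - 1 known speeds of r, i.e. X_r(n - w + 1 .. n - 1).
    neighbours: list of (Y_ri, c_pre_i) pairs, where Y_ri is the lag-aligned
        window of length w of upstream road r_i and c_pre_i is its cross
        correlation with r over the previous window.
    Returns (best candidate speed, value of f at that candidate).
    """
    best_x, best_f = None, float("inf")
    steps = int(round((hi - lo) / step))
    for s in range(steps + 1):
        x = lo + s * step
        y_r = list(y_r_known) + [x]
        f = sum((pearson(y_ri, y_r) - c_pre) ** 2 for y_ri, c_pre in neighbours)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f

# Toy case: r behaved like 0.5 * r1 and like 40 - r2 in the previous window.
neighbours = [([20, 24, 22, 26, 18], 1.0), ([30, 28, 29, 27, 31], -1.0)]
print(fill_vacancy([10, 12, 11, 13], neighbours))
```

A grid search is slower than a closed-form root but avoids differentiating the correlation terms, which makes it convenient for checking an analytic solution.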

(2) A Recursive Way for Global Vacancies Completion: 

Having the method for single road travel speed vacancy 
completion, we now fill all the vacancies of roads by using 
it in a recursive way. 

As mentioned above, we use $A_r$ to limit the number of upstream road segments that help to calculate the current travel speed for $r$. More specifically, we use a limited backtracking method to achieve this. We set a distance threshold by combining the network distance $dist()$ and the intersection distance. The intersection distance $dist_I(u, r)$ between road segments $u$ and $r$ is the number of intersections between them along the path of their network distance $dist(u, r)$. For example, $dist_I(r_1, r_5)$ in Fig. 7 is three. We use the product of the intersection distance and the network distance to limit $A_r$: $Dist(u, r) = dist(u, r) \times dist_I(u, r)$. Only when $Dist(r_i, r) \le d_A$ should the upstream road segment $r_i$ be taken into consideration to calculate the travel speed for $r$.
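The limited backtracking rule above can be sketched as a simple filter. This is an illustrative sketch only: the function names and the dictionary layout are hypothetical, and the comparison is assumed to be "less than or equal to the threshold".

```python
def combined_distance(network_dist, intersection_count):
    """Dist(u, r): product of network distance and intersection distance."""
    return network_dist * intersection_count

def select_upstream(candidates, threshold):
    """Keep upstream segments whose combined distance does not exceed the threshold.

    candidates maps a road id to (network distance, number of intersections);
    this layout is an assumption made for the example.
    """
    return [road for road, (d, k) in candidates.items()
            if combined_distance(d, k) <= threshold]
```

For instance, with a threshold of 2000, a segment 500 m away across three intersections (product 1500) is kept, while one 900 m away across three intersections (product 2700) is dropped.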

The use of the cross correlation function is based on the assumption that the current-interval travel speed values for every $r_i \in R_r$ are available, but in reality it is inevitable that some of them are also empty. Note that when traffic is under low pressure, the sparsity of vehicular data also becomes serious. So we define a threshold $N_{min}$. When the number of road segments in $R_r$ that have travel speeds, denoted as $N_r$, satisfies $N_r \ge N_{min}$, we stipulate that the calculation for $r$ can continue, which means that we use only the roads in $R_r$ that have travel speeds and calculate the cross correlations only between them and $r$. When $N_r$ satisfies the above criterion, we say that $r$ is calculable.

Now, we use a recursive algorithm to fill global travel speed vacancies. We start from a road segment $r$ that has no travel speed for the current interval. Then we check each road segment $r_i \in R_r$ one by one, from the nearest to the farthest. If $r$ is calculable, we simply calculate its travel speed with the help of $R_r$. If $R_r$ contains any $r_i$ that does not have a travel speed, we put them in $R'_r$. For every $r_i \in R'_r$, we take it as a road whose travel speed needs to be filled, again one by one from the nearest to the farthest, and check $R_{r_i}$ and $R'_{r_i}$ for it. We show the whole algorithms for travel speed completion in Alg. 1 and Alg. 2.


Algorithm 1 $TS\_All(\mathcal{R})$: Travel speed calculation for $\mathcal{R}$.
Input: The edge set $\mathcal{R}$ of $\mathcal{G}$.
Output: The estimated travel speeds for every $r \in \mathcal{R}$.
for each $r \in \mathcal{R}$ do
    if $X_r(n)$ is not calculated then
        if $r$ has records for the current interval then
            calculate $X_r(n)$ using the method introduced in Section IV-C1;
        else
            call $TS(r)$;
        end if
    end if
end for


Algorithm 2 $TS(r)$: Travel speed filling-up for $r$.
Input: Road segment $r$.
Output: The estimated travel speed $X_r(n)$ for $r$.
stack $S \leftarrow \emptyset$;
if $r$ is calculable then
    calculate $X_r(n)$;
else
    push($S$, $r$);
    for each $r_i \in R_r$ do
        call $TS(r_i)$;
    end for
    calculate $X_r(n)$;
    pop($S$, $r$);
end if
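The recursion of Alg. 2 can be sketched in Python as below. This is a hedged sketch under several assumptions: the `estimate` callback is a placeholder for the single-road minimization, the `in_progress` set is an added guard against cycles in the road graph (one possible reading of the role of the stack $S$), and the names `fill_speed`, `speeds`, and `upstream` are hypothetical.

```python
def fill_speed(r, speeds, upstream, estimate, n_min, in_progress=None):
    """Recursively fill speeds[r]; return the value, or None if nothing is usable."""
    if in_progress is None:
        in_progress = set()
    if speeds.get(r) is not None:      # already known for the current interval
        return speeds[r]
    if r in in_progress:               # r is being filled higher up the call chain
        return None
    in_progress.add(r)

    def known():
        # Upstream segments of r that currently have a travel speed.
        return [u for u in upstream.get(r, []) if speeds.get(u) is not None]

    if len(known()) < n_min:           # r is not calculable yet
        for u in upstream.get(r, []):  # assumed ordered from nearest to farthest
            fill_speed(u, speeds, upstream, estimate, n_min, in_progress)
    in_progress.discard(r)

    available = known()
    if available:                      # calculate with whatever is available now
        speeds[r] = estimate(r, available, speeds)
    return speeds.get(r)
```

As in Alg. 2, the value for $r$ is calculated after the upstream calls return, even if fewer than $N_{min}$ neighbors could be filled; a stricter variant could leave such roads empty instead.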


(3) Initialization of Our Algorithm 

Note that our estimation method requires that, at the $n$-th interval, the travel speeds of the past $w - 1$ calculation intervals are all available for every $r \in \mathcal{R}$. So, when $n < w$, we have to fill the vacancies in another way. We do this by averaging the historical values. If no value has been calculated in the history, we set the value to a suitably high speed, because the absence of data usually means that few vehicles are running on the road, which in turn usually indicates a good traffic condition. This is the initialization procedure of our strategy.
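The initialization rule can be written in a few lines. This is a minimal sketch; the function name and the default of 60.0 are hypothetical, since the text only says "a properly high speed" without giving a value.

```python
def initial_speed(history, default_speed=60.0):
    """Average the available historical values; fall back to a high default speed.

    history may contain None for intervals without a calculated value.
    default_speed is an assumed placeholder, not a value from the paper.
    """
    values = [v for v in history if v is not None]
    return sum(values) / len(values) if values else default_speed
```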

(4) Parallelization of Our Algorithm 

To accelerate the global travel speed completion performed by our recursive algorithm, we can divide the whole city area into several non-overlapping regions of similar size. Then, in each of these regions, we choose a road as the starting location of the recursive algorithm. By doing so, our method can be executed in parallel. The feasibility of this parallelization is guaranteed by the wide spatial distribution of vehicles.


V. Experimental Evaluation 

In this section, we demonstrate the advantage of our travel speed calculation method by comparing its estimation errors with those of classical methods.


A. Estimation Error Measurement 


Suppose that $S = [s_1, s_2, \cdots]$ contains the ground truth values of travel speeds for every road segment in $\mathcal{R}$, and $\hat{S} = [\hat{s}_1, \hat{s}_2, \cdots]$ contains the estimated values of these travel speeds. The relative error of $\hat{S}$ to $S$, denoted $\varepsilon(\hat{S}, S)$, is the relative difference between them.




B. Travel Speed Estimation Results 


As introduced, calculating the estimation error needs ground truth. The real ground truth of travel speeds for roads is unavailable, so we use cross validation instead. We set the ground truth by picking an experiment duration in which the coverage ratio of roads is high. Then we randomly hide the travel speeds of some roads with a predefined missing ratio, recover these hidden values with our completion strategy and with the comparative methods, and calculate their respective estimation errors. Based on experimental analysis, the experiment duration we select is 7:00:00-16:00:00 on April 24th, 2011, in which on average 1512 out of 1882 roads are covered by vehicles during every interval when $N_{thr} = 2$. There are two reasons why we take 2 as the value of $N_{thr}$. Firstly, the sampling data collected by probe taxis is so coarse that the number of taxis running through a road segment is not large for a relatively short interval, as we have shown in Section III-B. Secondly, if we take 1 as $N_{thr}$, it is not representative enough to implement our methods for the travel speed and time lagging factor calculation.
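The hiding step of this cross validation can be sketched as follows. This is an illustrative sketch, assuming that hidden values are chosen uniformly at random per interval; the function name `hide_values` and the fixed seed are hypothetical.

```python
import numpy as np

def hide_values(speeds, missing_ratio, seed=0):
    """Hide a fraction of the known travel speeds so they can be recovered and scored.

    Returns a copy with hidden entries set to NaN and the sorted hidden indices.
    """
    rng = np.random.default_rng(seed)
    speeds = np.asarray(speeds, dtype=float)
    n_hide = int(round(missing_ratio * speeds.size))
    hidden_idx = rng.choice(speeds.size, size=n_hide, replace=False)
    masked = speeds.copy()
    masked[hidden_idx] = np.nan        # these entries play the role of vacancies
    return masked, np.sort(hidden_idx)
```

The original values at `hidden_idx` then serve as ground truth when the error of each completion method is computed.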

Before the comparison, we should determine the important parameters $T$ and $w$, because different combinations of their values give quite different estimation errors. We determine them by averaging the estimation errors corresponding to ten randomly chosen, separate hours from the whole time span of the data set. Fig. 14 shows the average errors at different $T$ and $w$ when the missing ratio is 0.2. It can be seen that when $T = 80$ s and $w = 12$, the average estimation error is the lowest. Other missing ratios give similar results.


Fig. 14: Average estimation error over ten randomly chosen, separate hours from the whole time span when the missing ratio is 0.2. Here, $d_A = 2000$ m, $N_{min} = 4$.

Now we introduce the estimation methods that we compare with.

•Kriging is a spatial interpolation method that builds a variation function model according to the spatial location relation among data [19] [20]. Based on this variation function, Kriging interpolates the missing data using the correlation between the locations that have data and those that do not. It is a well-known data recovery method, but it mainly focuses on the spatial relations of data at different locations.

•KNN uses the weighted average of the values at the $K$ nearest locations to the location whose data is to be estimated as the estimation value. Here we take the inverse of distance as the weighting basis: the nearer a location is to the location to be estimated, the bigger its weight. Here we use 4 as the number $K$, as is commonly used in the literature.

•ARIMA (Autoregressive Integrated Moving Average model) is a well-known prediction method based on time series analysis. It regresses the dependent variable on its lagged values and on the present and lagged values of the random error to establish models. It mainly considers the temporal relations among data [21]. Here we use the values in the past $w - 1$ intervals to estimate the values in the $n$-th interval.
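Of the three baselines above, the KNN estimator with inverse-distance weights is simple enough to sketch directly. This is a minimal sketch, not the exact baseline code used in the experiments; the function name `knn_idw` and the handling of a zero distance are assumptions.

```python
def knn_idw(distances, values, k=4):
    """Inverse-distance-weighted mean of the k nearest known values."""
    nearest = sorted(zip(distances, values))[:k]   # k closest (distance, value) pairs
    for d, v in nearest:
        if d == 0:                                 # exact location match: use it as is
            return float(v)
    weights = [1.0 / d for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
```

With distances 1, 2, 4, 4 and values 10, 20, 40, 40, the weights are 1, 0.5, 0.25, 0.25, so the estimate is 40 / 2 = 20; a fifth, more distant location is ignored when $K = 4$.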

Fig. 15 shows the estimation errors of STC, Kriging, KNN, and ARIMA for every calculation interval in the experiment duration. The missing ratio in Fig. 15 is set to 0.2. We can see that STC achieves the lowest errors at most of the intervals. To observe the robustness of STC when the missing ratio varies, we show the average estimation error of travel speeds over all the intervals in the experiment duration in Fig. 16. We can see that when the missing ratio is smaller than 0.7, STC always gets the lowest errors among the four algorithms.



Fig. 16: The average errors for different missing ratios (horizontal axis: missing ratio) when interval $T = 80$ s, $w = 12$.


C. An Extension for Future Travel Speed Prediction 

Besides current-interval travel speed calculation, predicting future travel speeds is also valuable for trip planning, transportation management, etc. With a modification, our method can also be used to predict travel speeds for the next calculation interval.

1) Our Method for Predicting Future Travel Speeds: Different from current-interval travel speed estimation, future travel speed prediction is based on the condition that all the values for the current calculation interval have been calculated but none of the travel speeds for the next interval is available. In this situation, we can predict travel speeds by a weighted averaging method.

Since no travel speed information for the next interval is available, and the cross correlation function only measures the correlation up to the current interval, to predict the travel speed in the next interval we can only use the roads from which the lagging factor to $r$ is bigger than 0. We pick these road segments from $R_r$ and denote the set containing them as $R_0$. For every $r_i \in R_0$, we have its corresponding $c_{now}(r_i, r)$. The value $c_{now}^2(r_i, r)$ indicates the coefficient of determination with which we can estimate the travel speeds for $r$ using the travel speed values of $r_i$. The weight we use is

$$\omega(r_i, r) = c_{now}^2(r_i, r) \Big/ \sum_{j=1}^{|R_0|} c_{now}^2(r_j, r) .$$

We then use linear regression to get the relationship between the travel speeds of $r_i$ and $r$. The regression result can be formulated as $X_r = a_i X_{r_i} + b_i$. Therefore, the predicted travel speed of $r$ for the next interval is

$$X_r^{+}(n+1) = \sum_{i=1}^{|R_0|} \left( (a_i \times X_{r_i}(n - k_{r_i,r} + 1) + b_i) \times \omega(r_i, r) \right). \qquad (6)$$
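Equation (6) can be evaluated as in the sketch below. It assumes the regression coefficients $a_i$, $b_i$ have already been fitted and that the lagged speeds $X_{r_i}(n - k_{r_i,r} + 1)$ have already been looked up; the function name `predict_next` and its argument layout are hypothetical.

```python
import numpy as np

def predict_next(c_now, a, b, lagged_speeds):
    """Weighted average of per-road linear predictions, following Eq. (6).

    c_now[i]         : current cross correlation between r_i and r
    a[i], b[i]       : slope and intercept of the regression of X_r on X_{r_i}
    lagged_speeds[i] : X_{r_i}(n - k_{r_i,r} + 1), looked up beforehand
    """
    c2 = np.asarray(c_now, dtype=float) ** 2
    weights = c2 / c2.sum()                        # normalized squared correlations
    per_road = (np.asarray(a, dtype=float) * np.asarray(lagged_speeds, dtype=float)
                + np.asarray(b, dtype=float))
    return float((weights * per_road).sum())
```

For two upstream roads with correlations 0.6 and 0.8, the weights are 0.36 and 0.64; per-road predictions of 50 and 40 then combine to 0.36 × 50 + 0.64 × 40 = 43.6.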

Fig. 17 shows our interval selection for prediction.

Fig. 17: Interval selection for travel speed prediction.


2) Future Travel Speeds Prediction Results: 

In future travel speed prediction, we can use the next-interval estimation results, calculated from the next interval's vehicular data, as ground truth. However, we still need to determine the parameters $T$ and $w$ first. Similarly, we randomly choose ten separate hours in the experiment duration and vary the values of $T$ and $w$. Fig. 18 shows the results of this procedure. We can see that STC gets the lowest error when $T = 90$ s and $w = 13$.

Then we compare the prediction error of STC with two classical prediction methods:

•KF (Kalman Filter) is a theoretically optimal data processing algorithm [14] [22] [23]. It combines the state values of the previous time step with the current state value and minimizes the mean squared error. It can be used to predict the system state with favorable accuracy.

•ARIMA, which was introduced above. In the literature, e.g., Voort et al. [21] used ARIMA time series models to forecast traffic flow.

Fig. 15: Comparison of the filling-up errors of the four estimation methods during 7:00-16:00 on April 24th, 2011. The missing ratio is 0.2, $T = 80$ s, and $w = 12$.




Fig. 18: The average prediction errors of ten randomly chosen 
hours at different T and w when the missing ratio is 0.2. 

The comparison result is shown in Fig. 19, from which we can see that, among the three methods, our method gives the lowest errors in all the calculation intervals.

VI. Conclusion And Future Work 

In this paper, we propose a new travel speed calculation strategy based on vehicular crowdsensing data, called STC, to estimate travel speeds. We use cross correlation to measure the spatial-temporal correlations of travel speeds among different road segments. When calculating the cross correlation, we use vehicle tracking to self-adaptively determine the time lagging factors of travel speed diffusion between roads. We settle the problem of single-road travel speed vacancy completion by reducing it to a minimization problem, using the local stationarity of cross correlation. Then we fill all the vacancies in a recursive way by utilizing the geometrical structure of the road network. Experiments based on real vehicle data show that our strategy estimates travel speeds with the lowest errors when compared with classical methods. We also show that our method can be easily transferred to predict travel speeds for the future calculation interval. In the future, we will pay more attention to saving wireless communication cost (e.g., bandwidth consumption) and to privacy protection in our design.